3 research outputs found

    A teachable semi-automatic web information extraction system based on evolved regular expression patterns

    This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. The system uses a human teacher to identify and extract relevant information from semi-structured HTML web pages. Regular expressions, chosen as the pattern-matching tool, are generated automatically from the training data to provide an improved grammar and lexicon. This particularly benefits the GP system, which may need to extend its lexicon when new tokens appear in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements.
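To make the idea of scoring evolved regular expressions against teacher-labelled data concrete, here is a minimal sketch. It is not the thesis's fitness function, which the abstract does not describe; the function name `regex_fitness`, the F1 scoring rule, and the sample HTML are all assumptions for illustration.

```python
import re


def regex_fitness(pattern, page, expected):
    """Score a candidate pattern by F1 against teacher-labelled strings.

    Hypothetical fitness function: F1 is an assumption, not taken from the thesis.
    Patterns with one capture group are scored on the captured text.
    """
    try:
        found = set(re.findall(pattern, page))
    except re.error:
        # A GP system can produce syntactically invalid patterns; give them zero.
        return 0.0
    wanted = set(expected)
    hits = len(found & wanted)
    if hits == 0:
        return 0.0
    precision = hits / len(found)
    recall = hits / len(wanted)
    return 2 * precision * recall / (precision + recall)


# Illustrative page fragment and teacher labels (made up for this sketch).
PAGE = '<td class="price">$12.50</td><td class="price">$8.00</td><td>n/a</td>'
LABELS = ["$12.50", "$8.00"]
```

A pattern that matches exactly the labelled strings scores highest, while an over-general pattern that also captures unlabelled cells is penalised through lower precision.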

    An evolution of a complete program using XML-based grammar definition

    XML is a technique for describing structured data that can be manipulated by different types of applications, especially to represent content on the Web. This paper presents a viable approach to automatically evolving a ‘sorting program’ by applying genetic programming and a full-syntax XML-based grammar definition to map the genotype to the phenotype. The genotypes are composed of fixed-length blocks of genes, each made up of a series of integer values. The paper reports that, in comparison to earlier work, the approach improves the structure of the grammar used in the mapping process, which guarantees that the generated program follows the correct syntax without a repair function. This allows more structured programs than earlier systems.
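The genotype-to-phenotype mapping described above can be sketched as follows. This is a generic grammar-guided mapping in which each integer gene selects a production by modulo; it is an assumption for illustration, not the paper's XML-based scheme, and the toy `GRAMMAR`, the function `map_genotype`, and the `max_steps` limit are hypothetical.

```python
# Toy grammar as a plain dict (the paper defines its grammar in XML; this is a stand-in).
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["y"]],
}


def map_genotype(genes, start="<expr>", max_steps=100):
    """Expand the leftmost non-terminal using one integer gene per choice.

    Returns the derived program text, or None if the expansion limit is hit.
    """
    if not genes:
        raise ValueError("genotype must contain at least one gene")
    symbols = [start]
    output = []
    used = 0
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)  # terminal: emit as-is
            continue
        if used >= max_steps:
            return None  # derivation did not terminate within the budget
        rules = GRAMMAR[symbol]
        # Genes are reused cyclically when the genotype is shorter than the derivation.
        choice = rules[genes[used % len(genes)] % len(rules)]
        used += 1
        symbols = list(choice) + symbols
    return " ".join(output)
```

Because every output is derived only through grammar productions, any program that is returned is syntactically valid by construction, which is the property the abstract attributes to its grammar (no repair function needed).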

    A stepwise evolution of functions

    The genotype-phenotype mapping in most Genetic Programming (GP) systems uses a predefined and rigid grammar definition. This method has been successful in producing the required solution. However, it can only be used to solve a limited set of problems. In this paper, a Teachable GP (TGP) system is proposed. An external GP system evolves a complete computer program, and the acceptable solution is then added automatically to the existing grammar definition as a function and made available to the TGP system. This dynamic grammar definition allows a more complex program to be generated, solving more complex problems. Experiments compare the performance of GP without the added function, GP with a user-defined function, and GP with the evolved function. The results show that GP with an evolved function is comparable to GP with a user-defined function and outperforms GP without the function.
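One way to picture the dynamic grammar definition is as a grammar that gains a new production once an evolved function is accepted. The sketch below assumes a dict-based grammar and a hypothetical helper `add_function`; the function name `swap_if_greater` is invented for illustration and does not come from the paper.

```python
import copy


def add_function(grammar, name, arity, target="<expr>"):
    """Return a copy of the grammar with a call to `name` added under `target`.

    Hypothetical helper: the paper's actual grammar format and update step are not shown.
    """
    if target not in grammar:
        raise KeyError(target)
    extended = copy.deepcopy(grammar)  # leave the original grammar untouched
    production = [name, "("]
    for k in range(arity):
        if k:
            production.append(",")
        production.append(target)
    production.append(")")
    if production not in extended[target]:  # avoid registering the same call twice
        extended[target].append(production)
    return extended


BASE = {
    "<expr>": [["<var>"], ["<expr>", "+", "<expr>"]],
    "<var>": [["x"], ["y"]],
}
EXTENDED = add_function(BASE, "swap_if_greater", 2)
```

After the call, programs derived from `EXTENDED` can invoke the new function wherever an expression is allowed, while `BASE` remains available for the comparison run without the added function.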